Abstract: The Cloud Computing paradigm is providing system
architects with a new powerful tool for building scalable applica- 
tions. Clouds allow allocation of resources on a "pay-as-you-go" 
model, so that additional resources can be requested during peak 
loads and released after that. However, this flexibility asks for 
appropriate dynamic reconfiguration strategies. In this paper we 
describe SAVER (qoS-Aware workflows oVER the Cloud), a QoS- 
aware algorithm for executing workflows involving Web Services 
hosted in a Cloud environment. SAVER allows execution of 
arbitrary workflows subject to response time constraints. SAVER 
uses a passive monitor to identify workload fluctuations based 
on the observed system response time. The information collected 
by the monitor is used by a planner component to identify the 
minimum number of instances of each Web Service which should 
be allocated in order to satisfy the response time constraint. 
SAVER uses a simple Queueing Network (QN) model to identify 
the optimal resource allocation. Specifically, the QN model is used 
to identify bottlenecks, and predict the system performance as 
Cloud resources are allocated or released. The parameters used 
to evaluate the model are those collected by the monitor, which 
means that SAVER does not require any particular knowledge of 
the Web Services and workflows being executed. Our approach 
has been validated through numerical simulations, whose results 
are reported in this paper. 

I. Introduction 

The emerging Cloud computing paradigm is rapidly gaining 
consensus as an alternative to traditional IT systems, as 
exemplified by the Amazon EC2 [1], Xen [2], IBM Cloud [3],
and Microsoft Cloud [4]. Informally, the Cloud computing
paradigm allows computing resources to be seen as a utility, 
available on demand. The term "resource" may represent 
infrastructure, platforms, software, services, or storage. In this 
vision, the Cloud provider is responsible for making the resources
available to the users as they request them.

Cloud services can be grouped into three categories [5]:
Infrastructure as a Service (IaaS), providing low-level
resources such as Virtual Machines (VMs) (e.g., Amazon
EC2 [1]); Platform as a Service (PaaS), providing software
development frameworks (e.g., Microsoft Azure [4]);
and Software as a Service (SaaS), providing applications (e.g.,
Salesforce.com [6]).

The Cloud provider has the responsibility to manage the
resources it provides (be they VM instances, programming
frameworks or applications) so that the user requirements and
the desired Quality of Service (QoS) are satisfied. Cloud users
are usually charged according to the amount of resources
they consume (e.g., some amount of money per hour of CPU 
usage). In this way, customers can avoid capital expenditures 
by using Cloud resources on a "pay-as-you-go" model. 

Users' QoS requirements (e.g., timeliness, availability, security)
are usually the result of a negotiation process engaged
between the resource provider and the user, which culminates 
in the definition of a Service Level Agreement (SLA) concern- 
ing their respective obligations and expectations. Guarantee- 
ing SLAs under variable workloads for different application 
and service models is extremely challenging: Clouds are char- 
acterized by high load variance, and users have heterogeneous 
and competing QoS requirements. 

In this paper we present SAVER (qoS-Aware workflows 
oVER the Cloud), a workflow engine provided as a SaaS. 
The engine allows different types of workflows to be executed 
over a set of Web Services (WSs). Workflows are described
using an appropriate notation (e.g., the WS-BPEL [7]
workflow description language). The workflow engine takes
care of interacting with the appropriate WSs as described in 
the workflow. 

In our scenario, users can negotiate QoS requirements with 
the service provider; specifically, for each type c of workflow, 
the user may request that the average execution time of the 
whole workflow should not exceed a threshold $R_c^+$. Once
the QoS requirements have been negotiated, the user can 
submit any number of workflows of the different types. Both 
the submission rate and the time spent by the workflows on 
each WS can fluctuate over time. 

Traditionally, when deciding the amount of resources to 
be dedicated to applications, service providers considered 
worst-case scenarios, resulting in resource over-provisioning. 
Since the worst-case scenario rarely happens, a static system 
deployment results in a processing infrastructure which is 
largely under-utilized. 

To increase the utilization of resources while meeting the 
requested SLA, SAVER uses an underlying IaaS Cloud to 
provide computational power on demand. The Cloud hosts 
multiple instances of each WS, so that the workload can 
be balanced across the instances. If a WS is heavily used, 
SAVER will increase the number of instances by requesting 
new resources from the Cloud. In this way, the response time 
of that WS can be reduced, reducing the total execution time
of workflows as well. SAVER monitors the workflow engine 
and detects when some constraints are being violated. System
reconfigurations are triggered periodically: instances are
added or removed where necessary.

Despite its conceptual simplicity, the idea above is quite 
challenging to implement in practice. To better illustrate the 
problem, let us consider the situation shown in Fig. 1, which
is modeled upon a similar example from [8]. We have three
Web Services $W_1$, $W_2$, $W_3$ which are used by two types of
workflows. Instances of the first type arrive at a rate of 2
req/s, and execute operations on $W_1$, $W_2$ and $W_3$. Instances
of the second workflow type arrive at a rate of 1 req/s and
only use $W_1$ and $W_3$. Each WS has a maximum capacity,
which corresponds to the maximum request rate it can handle. 
Web Services 1 and 3 have a maximum capacity of 2 req/s, 
while WS 2 has a capacity of 3 req/s. 

In Fig. 1(a) the capacity of $W_1$ is exceeded, because the
aggregate arrival rate (3 req/s) is greater than its processing
capacity. Thus, a queue of unprocessed invocations of $W_1$
builds up, until requests start to timeout and are dropped at
a rate of 1 req/s. To eliminate the bottleneck, a possible
solution is to create multiple instances of the bottleneck WS
on different servers, and balance the load across all instances.
If we apply this strategy and create two instances of $W_1$, we
get the situation shown in Fig. 1(b): the aggregate processing
capacity of $W_1$ is now 4 req/s, and thus Web Service 1 is no
longer the bottleneck. However, the bottleneck shifts to $W_3$,
which now sees an aggregate arrival rate of 3 req/s and has
a capacity of 2 req/s.

The situation above demonstrates the bottleneck shift phe- 
nomenon: fixing a bottleneck may create another bottleneck 
at a different place. Thus, satisfying QoS constraints on 
systems subject to variable workloads is challenging, because 
identifying the system configuration which satisfies all con- 
straints might involve multiple reconfigurations of individual 
components (in our scenario, adding WS instances). If the 
reconfiguration is implemented in a purely reactive manner, 
each step must be applied sequentially in order to monitor its 
impact and plan for the next step. This is clearly inefficient 
because adaptation would be exceedingly slow. 
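
The bottleneck shift of the example above can be reproduced with a few lines of Python. This is only an illustrative sketch under assumptions that are not stated in the text: the WSs are visited in the order $W_1$, $W_2$, $W_3$, and an overloaded WS drops requests of all workflow types in proportion to their rates. The function name `find_bottlenecks` and the data layout are hypothetical.

```python
from fractions import Fraction

def find_bottlenecks(rates, routes, capacity, instances):
    """Return the WSs whose offered load exceeds their aggregate capacity.

    rates[c]     -- arrival rate of workflow type c (req/s)
    routes[c]    -- WSs used by workflow type c
    capacity[w]  -- capacity of ONE instance of WS w (req/s)
    instances[w] -- number of instances of WS w
    WSs are processed in the insertion order of `capacity` (assumed visiting order).
    """
    flow = {c: Fraction(r) for c, r in rates.items()}  # surviving rate per type
    bottlenecks = []
    for ws in capacity:
        users = [c for c in flow if ws in routes[c]]
        offered = sum(flow[c] for c in users)
        limit = capacity[ws] * instances[ws]
        if offered > limit:
            bottlenecks.append(ws)
            # Assumption: excess requests are dropped proportionally.
            for c in users:
                flow[c] *= Fraction(limit) / offered
    return bottlenecks

rates = {"type1": 2, "type2": 1}
routes = {"type1": ["W1", "W2", "W3"], "type2": ["W1", "W3"]}
capacity = {"W1": 2, "W2": 3, "W3": 2}

before = find_bottlenecks(rates, routes, capacity, {"W1": 1, "W2": 1, "W3": 1})
after = find_bottlenecks(rates, routes, capacity, {"W1": 2, "W2": 1, "W3": 1})
```

With one instance per WS only $W_1$ is reported; after doubling $W_1$, the full 3 req/s reaches $W_3$, which becomes the new bottleneck.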

In general, the response time at a specific WS depends both 
on the number of instances of that Web Service, and also 
on the intensity of other workload classes (workflow types). 
Thus, a suitable system performance model must be used in 
order to predict the response time of a given configuration. The 
performance model can be used to drive the reconfiguration 
process proactively: different system configurations can be 
evaluated quickly, and multiple reconfiguration steps can be 
planned in advance. SAVER uses an open, multiclass Queueing
Network (QN) model to represent resource contention by 
multiple independent request flows, which is crucial in our 
scenario. The parameters which are needed to evaluate the QN 
model can be easily obtained by passively monitoring the 
running system. The performance model is used within a 
greedy strategy which identifies an approximate solution to the 
optimization problem minimizing the number of WS instances 
while respecting the SLA. 

Structure of this paper: The remainder of this paper is
organized as follows. In Section II we review the scientific
literature and compare SAVER with related works. In Section III
we give a precise formulation of the problem we are
addressing. In Section IV we describe the Queueing Network
performance model of the Cloud-based workflow engine.
SAVER is fully described in Section V, including the
high-level architecture and the details of the reconfiguration
algorithms. The effectiveness of SAVER has been evaluated
by means of simulation experiments, whose results are
discussed in Section VI. Finally, conclusions and future works
are presented in Section VII. In order to make this paper
self-contained without sacrificing clarity, we relegated the
mathematical details of the analysis of the performance model
to a separate Appendix.

II. Related works 

Several research contributions have previously addressed the 
issue of optimizing the resource allocation in cluster-based 
service centers. Recently, with the emergence of virtualization
approaches and Cloud computing, additional research on
automatic resource management has been conducted. In this
section we briefly review some recent results; some of them
take advantage of control theory-based feedback loops [9],
[10], machine learning techniques [11], [12], or utility-based
optimization techniques [13], [14].

When moving to virtualized environments the resource 
allocation problem becomes even more complex because of 
the introduction of virtual resources [14]. Several approaches
have been proposed for QoS and resource management at run-time
[9], [15]-[19].

The approach presented in [15] describes a method for
achieving optimization in Clouds by using performance models
throughout the development and operation of the applications
running in the Cloud. The proposed optimization aims at maximizing
profits in the Cloud by guaranteeing the QoS agreed
in the SLAs, taking into account a large variety of workloads.
A layered Cloud architecture taking into account different
stakeholders is presented in [9]. The architecture supports
self-management based on adaptive feedback control loops, present
at each layer, and on a coordination activity between the
different loops. Mistral [16] is a resource management framework
with a multi-level resource allocation algorithm considering
reallocation actions based mainly on adding, removing and/or
migrating virtual machines, and shutdown or restart of hosts.
This approach is based on a Layered Queuing
Network (LQN) performance model. It tries to maximize the
overall utility taking into account several aspects like power
consumption, performance and transient costs in its
reconfiguration process. In [18] the authors present an approach to
self-adaptive resource allocation in virtualized environments
based on online architecture-level performance models. The
online performance prediction allows estimation of the effects
of changes in user workloads and of possible reconfiguration
actions. Yazir et al. [19] introduce a distributed approach
for dynamic autonomous resource management in computing
Clouds, performing resource configuration through Multiple
Criteria Decision Analysis.

With respect to these works, SAVER lies in the same 
research line fostering the usage of models at runtime to drive 
the QoS-based system adaptation. SAVER uses an efficient 
modeling and analysis technique that can then be used at 
runtime without undermining the system behavior and its 
overall performance. 

Ferretti et al. propose in [17] a middleware architecture
enabling an SLA-driven dynamic configuration, management
and optimization of Cloud resources and services. The ap- 
proach makes use of a load balancer that distributes the 
workload among the available resources. When the perceived 
QoS deviates from the SLA, the platform is dynamically 
reconfigured by acquiring new resources from the Cloud. 
On the other hand, if resource under-utilization is detected,
the system triggers a reconfiguration to release those unused 
resources. This approach is purely reactive and considers a 
single-tier application, while SAVER works for an arbitrary 
number of WSs and uses a performance model to plan complex 
reconfigurations in a single step. 

Canfora et al. [20] describe a QoS-aware service discovery
and late-binding mechanism which is able to automatically 
adapt to changes of QoS attributes in order to meet the SLA. 



The authors consider the execution of workflows over a 
set of WSs, such that each WS has multiple functionally 
equivalent implementations. Genetic Algorithms are used to
bind each WS to one of the available implementations, so 
that a fitness function is maximized. The binding is done at 
run-time, and depends on the values of QoS attributes which 
are monitored by the system. It should be observed that in 
SAVER we consider a different scenario, in which each WS 
has just one implementation which however can be instantiated 
multiple times. The goal of SAVER is to satisfy a specific QoS 
requirement (mean execution time of workflows below a given 
threshold) with the minimum number of instances. 

III. Problem Formulation 

SAVER is a workflow engine whose general structure is
depicted in Fig. 2: it receives workflows from external clients,
and executes them over a set of K WSs $W_1, \ldots, W_K$. Workflows
can be of C different types (or classes); for each class
c = 1, ..., C, clients define a maximum allowed completion
time $R_c^+$. This means that an instance of class c workflow
must be completed, on average, in time less than $R_c^+$. New
workflow classes can be created at any time; when a new class
is created, its maximum response time is negotiated with the
workflow service provider.

We denote with $\lambda_c$ the average arrival rate of class c
workflows. Arrival rates can change over time.¹ Since all WSs
are shared between the workflows, the completion time of a
workflow depends both on arrival rates $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_C)$, and
on the utilization of each WS.

In order to satisfy the response time constraints, the system
must adapt to cope with fluctuations of the workload. To do
so, SAVER relies on an IaaS Cloud which maintains multiple
instances of each WS. Run-time monitoring information is sent
by all WSs back to the workflow engine to drive the adaptation
process. We denote with $N_k$ the number of instances of WS
$W_k$; a system configuration $\mathbf{N} = (N_1, \ldots, N_K)$ is an integer
vector representing the number of allocated instances of
each WS.

When a workflow interacts with Wk, it is bound to one of 
the Nk instances so that the requests are evenly distributed. 
When the workload intensity increases, additional instances 
are created to eliminate the bottlenecks; when the workload 
decreases, surplus instances are shut down and released. 

The goal of SAVER is to minimize the total number of WS
instances while maintaining the mean execution time of type
c workflows below the threshold $R_c^+$, c = 1, ..., C. Formally,
we want to solve the following optimization problem:

$$
\begin{aligned}
\text{minimize} \quad & J(\mathbf{N}) = \sum_{k=1}^{K} N_k \\
\text{subject to} \quad & R_c(\mathbf{N}) \le R_c^+ \quad \text{for all } c = 1, 2, \ldots, C \\
& N_k \in \{1, 2, 3, \ldots\}
\end{aligned}
\tag{1}
$$

where $R_c(\mathbf{N})$ is the mean execution time of type c workflows
when the system configuration is $\mathbf{N} = (N_1, \ldots, N_K)$.

¹ In order to simplify the notation, we write $\lambda_c$ instead of $\lambda_c(t)$. In general,
we will omit explicit reference to t for all time-dependent parameters.

If the IaaS Cloud which hosts WS instances is managed by 
some third-party organization, then reducing the number of 
active instances reduces the cost of the workflow engine. 

IV. System Performance Model 

Before illustrating the details of SAVER, it is important to
describe the QN performance model which is used to plan a
system reconfiguration. We model the system of Fig. 2 using
the open, multiclass QN model [21] shown in Fig. 3. A QN
model is a set of queueing centers, which in our case are FIFO
queues attached to a single server. Each server represents a
single WS instance; thus, $W_k$ is represented by $N_k$ queueing
centers, for each k = 1, ..., K. $N_k$ can change over time, as
resources are added or removed from the system.

In our QN model there are C different classes of requests, 
which are generated outside the system. Each request repre- 
sents a workflow, thus workflow types are directly mapped 
to QN request classes. To simplify the analysis of
the model, we assume that the inter-arrival
time of class c requests is exponentially distributed with
arrival rate $\lambda_c$. This means that a new workflow of type c is
submitted, on average, every $1/\lambda_c$ time units.
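
As a small illustration of this assumption, the sketch below draws exponentially distributed inter-arrival times and checks that the observed arrival rate approaches $\lambda_c$. The function name `sample_arrival_times`, the horizon and the seed are hypothetical choices made only for this example.

```python
import random

def sample_arrival_times(rate, horizon, seed=0):
    """Return arrival instants in [0, horizon) with exponential inter-arrival times."""
    rng = random.Random(seed)      # fixed seed for reproducibility
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate)  # mean inter-arrival time is 1/rate
        if t >= horizon:
            return arrivals
        arrivals.append(t)

horizon = 10_000.0
arrivals = sample_arrival_times(rate=2.0, horizon=horizon)
estimated_rate = len(arrivals) / horizon  # approaches 2.0 for long horizons
```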

The interaction of a type c workflow with WS $W_k$ is modeled
as a visit of a class c request to one of the $N_k$ queueing
centers representing $W_k$. We denote with $R_{ck}(\mathbf{N})$ the total
time (residence time) spent by type c workflows on one of
the $N_k$ instances of $W_k$ for a given configuration $\mathbf{N}$. The
residence time is the sum of two terms: the service demand
$D_{ck}(\mathbf{N})$ (average time spent by a WS instance executing the
request) and queueing delay (time spent by a request in the
waiting queue). The QN model allows multiple visits to the
same queueing center, because the same WS can be executed
multiple times by the same workflow. The residence time and
service demands are the sum of residence and service time of
all invocations of the same WS instance.

The utilization $U_k(\mathbf{N})$ of an instance of $W_k$ is the fraction of
time the instance is busy processing requests. If the workload
is evenly balanced, then both the residence time $R_{ck}(\mathbf{N})$ and
the utilization $U_k(\mathbf{N})$ are almost the same for all $N_k$ instances
of $W_k$.
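
The quantities above can be related with a short sketch. It relies on the standard relations for open multiclass networks with single-server queueing centers, $U_k = \sum_c \lambda_c D_{ck}$ and $R_{ck} = D_{ck} / (1 - U_k)$, which are taken here as an assumption; the second one is consistent with Eq. (5). Function names are hypothetical.

```python
def utilization(arrival_rates, demands_k):
    """U_k = sum over classes of lambda_c * D_ck, for one instance of W_k."""
    return sum(lam * d for lam, d in zip(arrival_rates, demands_k))

def residence_times(arrival_rates, demands_k):
    """R_ck = D_ck / (1 - U_k): service demand inflated by queueing delay."""
    u = utilization(arrival_rates, demands_k)
    if u >= 1.0:
        raise ValueError("instance is saturated (U_k >= 1)")
    return [d / (1.0 - u) for d in demands_k]

# Two workflow classes sharing one WS instance (illustrative numbers).
lam = [2.0, 1.0]        # arrival rates, req/s
d_k = [0.2, 0.1]        # per-instance service demands, s
u_k = utilization(lam, d_k)          # half of the time busy
r_k = residence_times(lam, d_k)      # each demand is doubled by queueing
```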
Fig. 3. Performance model based on an open, multiclass Queueing Network

TABLE I
Symbols used in this paper

C                            Number of workflow types
K                            Number of Web Services
$\boldsymbol{\lambda}$       Vector of per-class arrival rates
$\mathbf{M}$                 Current system configuration
$\mathbf{N}, \mathbf{N}'$    Arbitrary system configurations
$R_{ck}(\mathbf{N})$         Residence time of type c workflows on an instance of $W_k$
$D_{ck}(\mathbf{N})$         Service demand of type c workflows on an instance of $W_k$
$R_c(\mathbf{N})$            Response time of type c workflows
$U_k(\mathbf{N})$            Utilization of an instance of $W_k$
$R_c^+$                      Maximum allowed response time for type c workflows



Table I summarizes the symbols used in this paper.

V. Architectural Overview of SAVER 

SAVER is a reactive system based on the Monitor-Analyze-Plan-Execute
(MAPE) control loop shown in Fig. 4. During
the Monitor step, SAVER collects operational parameters by
observing the running system. The parameters are evaluated
during the Analyze step; if the system needs to be reconfigured
(e.g., because the observed response time of class c workflows
exceeds the threshold $R_c^+$ for some c), a new configuration is
identified in the Plan step. We use the QN model described in
Section IV to evaluate different configurations and identify an
optimal server allocation such that all QoS constraints are
satisfied. Finally, during the Execute step, the new configuration
is applied to the system: WS instances are created or destroyed
as needed by leveraging the IaaS Cloud. Unlike other reactive
systems, SAVER can plan complex reconfigurations, involving
multiple additions/removals of resources, in a single step.

A. Monitoring System Parameters 

The QN model is used to estimate the execution time of 
workflow types for different system configurations. To analyze 
the QN it is necessary to know two parameters: (i) the arrival
rate of type c workflows, $\lambda_c$, and (ii) the service demand
$D_{ck}(\mathbf{M})$ of type c workflows on an instance of WS $W_k$, for
the current configuration $\mathbf{M}$.

The parameters above can be computed by monitoring the
system over a suitable period of time. The arrival rates $\lambda_c$ can
be estimated by counting the number $A_c$ of arrivals of type c
workflows which are submitted over the observation period of
length T. Then $\lambda_c$ can be defined as $\lambda_c = A_c / T$.
Fig. 4. SAVER Control Loop

TABLE II
Equations for the QN model of Fig. 3
Measuring the service demands $D_{ck}(\mathbf{M})$ is a bit more
difficult because they must not include the time spent by a
request waiting to start service. If the WSs do not provide
detailed timing information (e.g., via their execution logs),
it is possible to estimate $D_{ck}(\mathbf{M})$ from parameters which
can be easily observed by the workflow engine, namely the
measured residence time $R_{ck}(\mathbf{M})$ and utilization $U_k(\mathbf{M})$. We
use the equations shown in Table II, which hold for the open
multiclass QN model in Fig. 3. These equations describe
well-known properties of open QN models, so they are given here
without proof. The interested reader is referred to [21] for
details.

The residence time is the total time spent by a type c
workflow with one instance of WS $W_k$, including waiting time
and service time. The workflow engine can measure $R_{ck}(\mathbf{M})$
as the time elapsed from the instant a type c workflow sends
a request to one of the $N_k$ instances of $W_k$, to the time the
request is completed. The utilization $U_k(\mathbf{M})$ of an instance
of $W_k$ can be obtained from the Cloud service dashboard (or
measured on the computing nodes themselves). Using the equations
of Table II, the service demands can be expressed as



$$
D_{ck}(\mathbf{M}) = R_{ck}(\mathbf{M}) \, \bigl(1 - U_k(\mathbf{M})\bigr) \tag{5}
$$
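
The two estimation steps of the Monitor phase, $\lambda_c = A_c / T$ and Eq. (5), can be sketched as follows. The function name `estimate_parameters`, the list-of-lists layout and the numbers are hypothetical and serve only as an illustration.

```python
def estimate_parameters(arrival_counts, period, residence, utilization):
    """Estimate arrival rates and per-instance service demands.

    arrival_counts[c] -- A_c, arrivals of class c observed during `period`
    residence[c][k]   -- measured residence time R_ck(M)
    utilization[k]    -- measured utilization U_k(M) of an instance of W_k
    """
    rates = [a / period for a in arrival_counts]            # lambda_c = A_c / T
    demands = [
        [r_ck * (1.0 - utilization[k]) for k, r_ck in enumerate(row)]  # Eq. (5)
        for row in residence
    ]
    return rates, demands

# Illustrative measurements: 2 classes, 2 Web Services, 60 s observation period.
rates, demands = estimate_parameters(
    arrival_counts=[120, 60],
    period=60.0,
    residence=[[0.4, 0.5], [0.2, 0.0]],   # class 2 does not use the second WS
    utilization=[0.5, 0.2],
)
```

Note that the estimate needs only quantities visible from outside the WSs: no instrumentation of the services themselves is assumed.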



B. Finding a new configuration 

In order to find an approximate solution to the optimization
problem (1), SAVER starts from the current configuration
$\mathbf{M}$, which may violate some response time constraints, and
executes Algorithm 1. After collecting device utilizations,
response times and arrival rates, SAVER estimates the service
demands $D_{ck}$ using Eq. (5).

Then, SAVER identifies a new configuration $\mathbf{N} \ge \mathbf{M}$² by
calling the function Acquire(). The new configuration $\mathbf{N}$ is
computed by greedily adding new instances to bottleneck WSs.



Algorithm 1 The SAVER Algorithm

Require: $R_c^+$: Maximum response time of type c workflows
Let M be the initial configuration
loop
    Monitor $R_{ck}(\mathbf{M})$, $U_k(\mathbf{M})$, $\lambda_c$
    for all c := 1, ..., C; k := 1, ..., K do
        Compute $D_{ck}(\mathbf{M})$ using Eq. (5)
    N := Acquire(M, λ, D(M), U(M))
    for all c := 1, ..., C; k := 1, ..., K do
        Compute $D_{ck}(\mathbf{N})$ and $U_k(\mathbf{N})$ using Eq. (…) and (8)
    N' := Release(N, λ, D(N), U(N))
    Apply the new configuration N' to the system
    M := N' {Set N' as the current configuration M}



The QN model is used to estimate response times as instances 
are added: no actual resources are instantiated from the Cloud 
service at this time. 

The configuration $\mathbf{N}$ returned by the function Acquire()
does not violate any constraint, but might contain too
many WS instances. Thus, SAVER invokes the function
Release() which computes another configuration $\mathbf{N}' \le \mathbf{N}$
by removing redundant instances, ensuring that no constraint
is violated. To call Release() we need to estimate
the service demands $D_{ck}(\mathbf{N})$ and utilizations $U_k(\mathbf{N})$ with
configuration $\mathbf{N}$. These can be easily computed from the
measured values for the current configuration $\mathbf{M}$.

After both steps above, N' becomes the new current con- 
figuration: WS instances are created or terminated where 
necessary by acquiring or releasing hosts from the Cloud 
infrastructure. 

Let us illustrate the functions Acquire() and Release()
in detail. 

a) Adding instances: Function Acquire() is described
by Algorithm 2. Given the system parameters and configuration
$\mathbf{N}$, which might violate some or all response time
constraints, the function returns a new configuration $\mathbf{N}'$ which
is estimated not to violate any constraint. At each iteration,
we identify the class b whose workflows have the maximum
relative violation of the response time limit (line 2); response
times are estimated using Eq. (9) in the Appendix. Then, we
identify the WS $W_j$ such that adding one more instance to it
produces the maximum reduction in the class b response time
(line 3). The configuration $\mathbf{N}$ is then updated by adding one
instance to $W_j$ (line 4); the updated configuration is $\mathbf{N} + \mathbf{1}_j$.³
The loop terminates when no workflow type is estimated to
violate its response time constraint.

Termination of Algorithm 2 is guaranteed by the fact that
function $R_c(\mathbf{N})$ is monotonically decreasing (Lemma 1 in the
Appendix). Thus, $R_c(\mathbf{N} + \mathbf{1}_k) \le R_c(\mathbf{N})$ for all c.
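
The greedy strategy of Acquire() can be sketched in Python as follows. This is not the implementation used in the paper (which was written in GNU Octave), and it makes several assumptions: `demand[c][k]` is the total service demand of class c on $W_k$ summed over its instances; the load on $W_k$ is split evenly, so the per-instance utilization is $\sum_c \lambda_c \, \texttt{demand}[c][k] / N_k$; and the response time is estimated with the standard open-network formula $R_c(\mathbf{N}) = \sum_k \texttt{demand}[c][k] / (1 - U_k(\mathbf{N}))$. The preliminary step that removes saturation is specific to this sketch.

```python
import math

def response_times(n, rates, demand):
    """Estimate R_c(N) for every class c (requires all utilizations < 1)."""
    classes, services = range(len(rates)), range(len(n))
    # Per-instance utilization when the load on W_k is split over n[k] instances.
    util = [sum(rates[c] * demand[c][k] for c in classes) / n[k] for k in services]
    if any(u >= 1.0 for u in util):
        raise ValueError("some WS instance is saturated")
    return [sum(demand[c][k] / (1.0 - util[k]) for k in services) for c in classes]

def acquire(n, rates, demand, limits):
    """Greedily add instances until no class exceeds its limit R_c^+."""
    classes, services = range(len(rates)), range(len(n))
    if any(limits[c] <= sum(demand[c]) for c in classes):
        raise ValueError("a limit is not above the bare service demand")
    n = list(n)
    # Sketch-specific step: first make every utilization strictly below one.
    for k in services:
        load = sum(rates[c] * demand[c][k] for c in classes)
        n[k] = max(n[k], math.floor(load) + 1)
    while True:
        r = response_times(n, rates, demand)
        # Class with the maximum relative violation of its limit.
        b = max(classes, key=lambda c: (r[c] - limits[c]) / limits[c])
        if r[b] <= limits[b]:
            return n
        def gain(k):
            trial = n[:k] + [n[k] + 1] + n[k + 1:]
            return r[b] - response_times(trial, rates, demand)[b]
        # WS whose extra instance reduces the class b response time the most.
        n[max(services, key=gain)] += 1

rates = [2.0, 1.0]                                # req/s for the two classes
demand = [[0.5, 1 / 3, 0.5], [0.5, 0.0, 0.5]]     # seconds; class 2 skips W2
limits = [3.5, 2.5]                               # illustrative thresholds
new_config = acquire([1, 1, 1], rates, demand, limits)
```

The numbers reuse the three-WS example of the Introduction with hypothetical response time limits; no actual resource is allocated while the loop runs, since only the model is evaluated.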

Algorithm 2 Acquire(N, λ, D(N), U(N)) → N'

Require: N System configuration
Require: λ Current arrival rates of workflows
Require: D(N) Service demands at configuration N
Require: U(N) Utilizations at configuration N
Ensure: N New system configuration
1: while ($R_c(\mathbf{N}) > R_c^+$ for any c) do
2:     $b := \arg\max_c \left\{ \frac{R_c(\mathbf{N}) - R_c^+}{R_c^+} \;\middle|\; c = 1, \ldots, C \right\}$
3:     $j := \arg\max_k \left\{ R_b(\mathbf{N}) - R_b(\mathbf{N} + \mathbf{1}_k) \mid k = 1, \ldots, K \right\}$
4:     $\mathbf{N} := \mathbf{N} + \mathbf{1}_j$
5: Return N

² $\mathbf{N} \ge \mathbf{M}$ iff $N_k \ge M_k$ for all k = 1, ..., K.
³ $\mathbf{1}_j$ is the vector with K elements whose j-th element is one and whose other elements are set to zero.

b) Removing instances: The function Release(), described
by Algorithm 3, is used to deallocate (release) WS
instances from an initial configuration $\mathbf{N}$ which does not
violate any response time constraint. The function implements
a greedy strategy, in which a WS $W_j$ is selected at each step,
and its number of instances is reduced by one. Reducing the
number of instances $N_j$ of $W_j$ is not possible if either (i) the
reduction would violate some constraint, or (ii) the reduction
would cause the utilization of some WS instances to become
greater than one (see Eq. (11) in the Appendix).

We start by defining the set S containing the index of WSs
whose number of instances can be reduced without exceeding
the processing capacity (line 3). Then, we identify the
workflow class d with the maximum (relative) response time
(line 5). Finally, we identify the value $j \in S$ such that
removing one instance of $W_j$ produces the minimum increase
in the response time of class d workflows (line 6). The
rationale is the following. Type d workflows are the most likely
to be affected by the removal of one WS instance, because
their relative response time (before the removal) is the highest
among all workflow types. Once the "critical" class d has been
identified, we try to remove an instance from the WS j which
causes the smallest increase of class d response time. Since
response time increments are additive (see Appendix), if the
removal of an instance of $W_j$ violates some constraints, no
further attempt should be made to consider $W_j$, and we remove
j from the candidate set S.

From the discussion above, we observe that function Release()
computes a Pareto-optimal solution $\mathbf{N}$. This means
that there exists no solution $\mathbf{N}' \le \mathbf{N}$, $\mathbf{N}' \ne \mathbf{N}$, such that $R_c(\mathbf{N}') \le R_c^+$ for all c.
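
A corresponding sketch of Release() is given below, under the same assumptions as for the response time estimate: `demand[c][k]` is the total demand of class c on $W_k$, the load is evenly split across instances, and $R_c(\mathbf{N}) = \sum_k \texttt{demand}[c][k] / (1 - U_k(\mathbf{N}))$. The lower bound `n_min` is computed here as the smallest instance count keeping the utilization strictly below one, which is a choice of this sketch. Selecting the class with the largest ratio $R_c(\mathbf{N}) / R_c^+$ is equivalent to selecting the class with the smallest relative slack.

```python
import math

def response_times(n, rates, demand):
    """Estimate R_c(N) for every class c (requires all utilizations < 1)."""
    classes, services = range(len(rates)), range(len(n))
    util = [sum(rates[c] * demand[c][k] for c in classes) / n[k] for k in services]
    if any(u >= 1.0 for u in util):
        raise ValueError("some WS instance is saturated")
    return [sum(demand[c][k] / (1.0 - util[k]) for k in services) for c in classes]

def release(n, rates, demand, limits):
    """Greedily remove instances from a feasible configuration n."""
    classes, services = range(len(rates)), range(len(n))
    n = list(n)
    # Assumption of this sketch: smallest count with utilization below one.
    n_min = [math.floor(sum(rates[c] * demand[c][k] for c in classes)) + 1
             for k in services]
    candidates = {k for k in services if n[k] > n_min[k]}

    def after_removal(k):
        trial = n[:k] + [n[k] - 1] + n[k + 1:]
        return response_times(trial, rates, demand)

    while candidates:
        r = response_times(n, rates, demand)
        # "Critical" class: highest response time relative to its limit.
        d = max(classes, key=lambda c: r[c] / limits[c])
        # WS whose removal increases the class d response time the least.
        j = min(sorted(candidates), key=lambda k: after_removal(k)[d])
        r_new = after_removal(j)
        if any(r_new[c] > limits[c] for c in classes):
            candidates.discard(j)      # no instance of W_j can be removed
        else:
            n[j] -= 1
            if n[j] == n_min[j]:
                candidates.discard(j)
    return n

rates = [2.0, 1.0]
demand = [[0.5, 1 / 3, 0.5], [0.5, 0.0, 0.5]]
limits = [3.5, 2.5]
reduced = release([5, 3, 5], rates, demand, limits)
```

Starting from the over-provisioned configuration (5, 3, 5), the sketch stops at a configuration from which no single instance can be removed without violating a limit.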

VI. Numerical Results 

We performed a set of numerical simulation experiments to 
assess the effectiveness of SAVER; results will be described 
in this section. We implemented Algorithms 1, 2 and 3 using
GNU Octave [22], an interpreted language for numerical
computations. 

In the first experiment we considered K = 10 Web Services
and C = 5 workflow types. Service demands $D_{ck}$ have been
randomly generated, in such a way that class c workflows have
service demands which are uniformly distributed in [0, c/C].
Thus, class 1 workflows have the lowest average service demands,
while type C workflows have the highest demands. The system
has been simulated for T = 200 discrete steps t = 1, ..., T;
each step corresponds to a time interval of length W, long
enough to amortize the reconfiguration costs.

Algorithm 3 Release(N, λ, D(N), U(N)) → N'

Require: N System configuration
Require: λ Current arrival rates of workflows
Require: D(N) Service demands at configuration N
Require: U(N) Utilizations at configuration N
Ensure: N' New system configuration
1: for all k := 1, ..., K do
2:     $Nmin_k := N_k \sum_{c=1}^{C} \lambda_c D_{ck}(\mathbf{N})$
3: $S := \{ k \mid N_k > Nmin_k \}$
4: while ($S \ne \emptyset$) do
5:     $d := \arg\min_c \left\{ \frac{R_c^+ - R_c(\mathbf{N})}{R_c^+} \;\middle|\; c = 1, \ldots, C \right\}$
6:     $j := \arg\min_k \left\{ R_d(\mathbf{N} - \mathbf{1}_k) - R_d^+ \mid k \in S \right\}$
7:     if ($R_c(\mathbf{N} - \mathbf{1}_j) > R_c^+$ for any c) then
8:         $S := S \setminus \{ j \}$ {No instance of $W_j$ can be removed}
9:     else
10:        $\mathbf{N} := \mathbf{N} - \mathbf{1}_j$
11:        if ($N_j = Nmin_j$) then
12:            $S := S \setminus \{ j \}$
13: Return N

Fig. 5. Simulation results (legend: mean response time, maximum response time)

Arrival rates A(t) at step t have been generated according to 
a fractal model, starting from a randomly perturbed sinusoidal 
pattern to mimic periodic fluctuations. Each workflow type has 
a different period. 

Figure 5 shows the results of the simulation. The top part
of the figure shows the estimated response time $R_c(\mathbf{N})$ (thick
lines) and upper limit $R_c^+$ (thin horizontal lines) for each class
c = 1, ..., C. The middle part of the figure shows the arrival
rates $\lambda_c(t)$ for each class c = 1, ..., C; note that arrival rates
have been stacked for clarity, such that the height of each
individual band corresponds to the value $\lambda_c(t)$ from c = 1
(bottom) to c = 5 (top). The total height of the middle graph
is the total arrival rate of all workflow types. Finally, each
band of the bottom part of Figure 5 shows the number $N_k$ of
instances of WS $W_k$, from k = 1 (bottom) to k = 10 (top);
again, the total height of all areas represents the total number of
resources which are allocated at each simulation step. As can
be seen, the number of allocated resources closely follows the
workload pattern.

We performed additional experiments in order to assess 
the efficiency of allocations produced by SAVER. In par- 
ticular, we are interested in estimating the reduction in the 
number of allocated instances produced by SAVER. To do 
so, we considered different scenarios for all combinations of 
$C \in \{10, 15, 20\}$ workflow types and $K \in \{20, 40, 60\}$ Web
Services. Each simulation has been executed for T = 200
steps; all other parameters (request arrival rates, service demands)
have been generated as described above.



Results are reported in Table III. Columns labeled C and K
show the number of workflow types and Web Services, respectively.
Columns labeled Iter. Acquire() contain the maximum
and average number of iterations performed by procedure
Acquire() (Algorithm 2); columns labeled Iter. Release()
contain the same information for procedure Release()
(Algorithm 3). Then, we report the minimum, maximum and
total number of resources allocated by SAVER during the
simulation run. Formally, let $S_t$ denote the total number of WS
instances allocated at simulation step t; then

$$
\text{Min. instances} = \min_t \{S_t\}, \qquad
\text{Max. instances} = \max_t \{S_t\}, \qquad
\text{Total instances} = \sum_t S_t
$$

The column labeled WS Instances (static) shows the number of
instances that would have been allocated by provisioning for
the worst case scenario; this value is simply $T \times \max_t \{S_t\}$.
The last column shows the ratio between the total number 
of WS instances allocated by SAVER, and the number of 
instances that would have been allocated by a static algorithm 
to satisfy the worst-case scenario; lower values are better. 
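
The summary metrics defined above can be computed from the per-step allocations as in the following sketch. The function name `allocation_summary` is hypothetical, and the sequence of $S_t$ values is made up for illustration; it is not taken from the experiments.

```python
def allocation_summary(instances_per_step):
    """Summarize a run given S_t, the number of instances allocated at each step."""
    steps = len(instances_per_step)                 # T
    total = sum(instances_per_step)                 # sum_t S_t
    static = steps * max(instances_per_step)        # T x max_t {S_t}
    return {
        "min": min(instances_per_step),
        "max": max(instances_per_step),
        "total": total,
        "static": static,
        "ratio": total / static,                    # lower values are better
    }

# Illustrative allocation trace over six steps (not experimental data).
summary = allocation_summary([4, 6, 10, 8, 5, 3])
```

In this made-up trace the dynamic allocation uses 36 instance-steps against 60 for static worst-case provisioning, that is, a ratio of 0.6.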

The results show that SAVER allocates between 64% and
72% of the instances required by the worst-case scenario.
As previously observed, if the IaaS provider charges a fixed
price for each instance allocated at each simulation step, then
SAVER allows a significant reduction of the total cost, while
still maintaining the negotiated SLA.

VII. Conclusions and Future Works 

In this paper we presented SAVER, a QoS-aware algorithm 
for executing workflows involving Web Services hosted in a 
Cloud environment. The idea underlying SAVER is to selec- 
tively allocate and deallocate Cloud resources to guarantee 
that the response time of each class of workflows is less 
than a negotiated threshold. The capability of driving the
dynamic resource allocation is achieved through the use of a
feedback control loop. A passive monitor collects information
that is used to identify the minimum number of instances of 
each WS which should be allocated to satisfy the response 
time constraints. The system performance at different config- 
urations is estimated using a QN model; the estimates are 
used to feed a greedy optimization strategy which produces 
the new configuration which is finally applied to the system. 
Simulation experiments show that SAVER can effectively 
react to workload fluctuations by acquiring/releasing resources 
as needed. 

The methodology proposed in this paper can be improved 
along several directions. In particular, in this paper we as- 
sumed that all requests of all classes are evenly distributed 
across the WS instances. While this assumption makes the 
system easier to analyze and implement, more effective allo- 
cations could be produced if we allow individual workflow 
classes to be routed to specific WS instances. This extension,
which is currently under investigation, would add another
level of complexity to SAVER. We are also exploring the
use of forecasting techniques as a means to trigger resource
allocation and deallocation proactively. Finally, we are working
on the implementation of our methodology on a real testbed, 
to assess its effectiveness through a more comprehensive set 
of real experiments. 

Appendix 

Let $\mathbf{M}$ be the current system configuration; let us assume
that, under configuration $\mathbf{M}$, the observed arrival rates are
$\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_C)$ and the service demands are
$D_{ck}(\mathbf{M})$. Then, for an arbitrary configuration $\mathbf{N}$, we can
combine Equations (3)
and Q to get: 



$$R_c(\mathbf{N}) = \sum_{k=1}^{K} \frac{N_k D_{ck}(\mathbf{N})}{1 - U_k(\mathbf{N})} \tag{6}$$



The current total class $c$ service demand on all instances
of $W_k$ is $M_k D_{ck}(\mathbf{M})$, hence we can express service demands
and utilizations of individual instances for an arbitrary
configuration $\mathbf{N}$ as:

$$D_{ck}(\mathbf{N}) = \frac{M_k}{N_k} D_{ck}(\mathbf{M}) \tag{7}$$

$$U_k(\mathbf{N}) = \frac{M_k}{N_k} U_k(\mathbf{M}) \tag{8}$$

Thus, we can rewrite (6) as

$$R_c(\mathbf{N}) = \sum_{k=1}^{K} \frac{D_{ck}(\mathbf{M}) M_k N_k}{N_k - U_k(\mathbf{M}) M_k} \tag{9}$$

which allows us to estimate the response time $R_c(\mathbf{N})$ of class
$c$ workflows, given information collected by the monitor for
the current configuration $\mathbf{M}$.
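
To illustrate how Eq. (9) can be evaluated from monitored data, the sketch below estimates the response time of one workflow class for a candidate configuration. It assumes the monitor supplies per-instance service demands and utilizations for the current configuration; the function name, argument names and numeric values are hypothetical and do not come from the paper.

```python
def estimate_response_time(d_m, u_m, m, n):
    """Estimate R_c(N) for a single class c using Eq. (9).

    d_m[k]: observed service demand D_ck(M) on one instance of W_k
    u_m[k]: observed utilization U_k(M) of one instance of W_k
    m[k]:   number of instances of W_k in the current configuration M
    n[k]:   number of instances of W_k in the candidate configuration N
    """
    total = 0.0
    for d, u, mk, nk in zip(d_m, u_m, m, n):
        slack = nk - u * mk  # denominator of the k-th term of Eq. (9)
        if slack <= 0:
            # The candidate configuration would drive W_k to saturation.
            raise ValueError("candidate configuration saturates a Web Service")
        total += d * mk * nk / slack
    return total


# Made-up measurements for K = 2 Web Services with 2 instances each.
d_m = [0.5, 0.2]
u_m = [0.8, 0.5]
m = [2, 2]
r_current = estimate_response_time(d_m, u_m, m, m)
r_candidate = estimate_response_time(d_m, u_m, m, [4, 2])
print(r_current, r_candidate)
```

Passing `n = m` evaluates the model at the current configuration, which gives a quick sanity check against the observed response time.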
From (2) and (7) we get:

$$U_k(\mathbf{N}) = \frac{M_k}{N_k} \sum_{c=1}^{C} \lambda_c D_{ck}(\mathbf{M}) \tag{10}$$



TABLE III

Simulation results for different scenarios

           Iter. ACQUIRE()   Iter. RELEASE()   WS Instances (dynamic)   WS Instances   Dynamic/
  C    K     max     avg       max     avg       min    max      tot      (static)      Static
 10   20      14    1.30        15    2.53        36    127    16589        25400        0.65
 10   40      22    2.43        19    3.81        76    257    33103        51400        0.64
 10   60      35    3.54        35    5.12       122    378    50211        75600        0.66
 15   20      10    1.27        13    2.56        78    178    23536        35600        0.66
 15   40      23    2.20        26    3.68       138    340    44843        68000        0.66
 15   60      34    3.20        44    5.04       239    526    68253       105200        0.65
 20   20       9    1.19        13    2.50       114    206    28792        41200        0.70
 20   40      24    2.33        29    4.00       215    408    57723        81600        0.71
 20   60      21    3.00        30    4.89       347    602    86684       120400        0.72



Since by definition the utilization of any WS instance must
be less than one, we can use (10) to define a lower bound on
the number $N_k$ of instances of $W_k$ as:

$$N_k > M_k \sum_{c=1}^{C} \lambda_c D_{ck}(\mathbf{M}) \tag{11}$$
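
As a small illustration of the bound in Eq. (11), the following sketch computes the smallest integer number of instances of one Web Service that keeps its utilization strictly below one. The function and argument names are hypothetical and the numeric values are made up for the example.

```python
import math


def min_instances(lambdas, d_m_k, m_k):
    """Smallest integer N_k that strictly exceeds the bound of Eq. (11).

    lambdas[c]: observed arrival rate of class c
    d_m_k[c]:   observed service demand D_ck(M) on one instance of W_k
    m_k:        number of instances of W_k in the current configuration M
    """
    bound = m_k * sum(lam * d for lam, d in zip(lambdas, d_m_k))
    # The inequality is strict, so an integer-valued bound needs one more instance.
    return math.floor(bound) + 1


# Made-up values: C = 2 classes, W_k currently has 2 instances.
n_min = min_instances([1.0, 2.0], [0.25, 0.25], 2)
print(n_min)
```

Because the inequality is strict, the sketch uses `floor(bound) + 1` rather than `ceil(bound)`; the two differ exactly when the bound is an integer.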

The following lemma can be easily proved: 

Lemma 1: The response time function $R_c(\mathbf{N})$ is
monotonically decreasing: for any two configurations $\mathbf{N}'$ and $\mathbf{N}''$
such that $N'_k \le N''_k$ for all $k = 1, \ldots, K$, we have that
$R_c(\mathbf{N}') \ge R_c(\mathbf{N}'')$.

Proof: If we extend $R_c(\mathbf{N})$ to be a continuous function,
its partial derivative is

$$\frac{\partial R_c(\mathbf{N})}{\partial N_k} = \frac{-M_k^2 \, U_k(\mathbf{M}) \, D_{ck}(\mathbf{M})}{\bigl(N_k - U_k(\mathbf{M}) M_k\bigr)^2}$$

which is less than zero for any $k$ for which the utilization
$U_k(\mathbf{M})$ and service demand $D_{ck}(\mathbf{M})$ are nonzero. Hence,
function $R_c(\mathbf{N})$ is decreasing. ■
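
For completeness, the derivative used in the proof can be obtained by applying the quotient rule to the $k$-th term of Eq. (9), which is the only term that depends on $N_k$. The shorthand $a = D_{ck}(\mathbf{M}) M_k$ and $b = U_k(\mathbf{M}) M_k$ is introduced here only for readability and does not appear in the original derivation.

```latex
\begin{align*}
\frac{\partial}{\partial N_k} \, \frac{a N_k}{N_k - b}
  &= \frac{a (N_k - b) - a N_k}{(N_k - b)^2}
   = \frac{-a b}{(N_k - b)^2} \\
  &= \frac{-M_k^2 \, U_k(\mathbf{M}) \, D_{ck}(\mathbf{M})}
          {\bigl(N_k - U_k(\mathbf{M}) M_k\bigr)^2}
\end{align*}
```

Since $a \ge 0$ and $b \ge 0$, the numerator is never positive, which is consistent with the sign argument in the proof.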

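Note that, according to Eq. (9), response time increments
are additive. This means that
$R_c(\mathbf{N}) - R_c(\mathbf{N} + \mathbf{1}_i) = \Delta_i$ and
$R_c(\mathbf{N}) - R_c(\mathbf{N} + \mathbf{1}_j) = \Delta_j$ imply
$R_c(\mathbf{N}) - R_c(\mathbf{N} + \mathbf{1}_i + \mathbf{1}_j) = \Delta_i + \Delta_j$.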